Maximally Correlated Principal Component Analysis

Authors

  • Soheil Feizi
  • David Tse
Abstract

In the era of big data, reducing data dimensionality is critical in many areas of science. The widely used Principal Component Analysis (PCA) addresses this problem by computing a low-dimensional data embedding that maximally explains the variance of the data. However, PCA has two major weaknesses. Firstly, it only considers linear correlations among variables (features), and secondly, it is not suitable for categorical data. We resolve these issues by proposing Maximally Correlated Principal Component Analysis (MCPCA). MCPCA computes transformations of variables whose covariance matrix has the largest Ky Fan norm. Variable transformations are unknown, can be nonlinear, and are computed in an optimization. MCPCA can also be viewed as a multivariate extension of Maximal Correlation. For jointly Gaussian variables, we show that the covariance matrix corresponding to the identity (or the negative of the identity) transformations majorizes covariance matrices of non-identity functions. Using this result, we characterize global MCPCA optimizers for nonlinear functions of jointly Gaussian variables for every rank constraint. For categorical variables, we characterize global MCPCA optimizers for the rank-one constraint based on the leading eigenvector of a matrix computed using pairwise joint distributions. For a general rank constraint, we propose a block coordinate descent algorithm and show its convergence to stationary points of the MCPCA optimization. We compare MCPCA with PCA and other state-of-the-art dimensionality reduction methods, including Isomap, LLE, multilayer autoencoders (neural networks), kernel PCA, probabilistic PCA, and diffusion maps, on several synthetic and real datasets. We show that MCPCA consistently provides improved performance compared to other methods.
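To make the objective in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of how the Ky Fan norm of a covariance matrix can be evaluated and how a nonlinear variable transformation can increase it. The helper names `ky_fan_norm` and `standardize`, the synthetic data, and the hand-picked transformation `x1 ** 2` are all illustrative assumptions; the sketch also assumes that transformed variables are normalized to zero mean and unit variance. It only scores two candidate transformations and does not perform the MCPCA optimization itself.

```python
import numpy as np


def ky_fan_norm(matrix, q):
    """Ky Fan q-norm of a symmetric PSD matrix: the sum of its q largest eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(matrix)  # returned in ascending order
    return float(np.sum(eigenvalues[-q:]))


def standardize(column):
    """Rescale a variable to zero mean and unit variance (assumed normalization)."""
    return (column - column.mean()) / column.std()


# Hypothetical synthetic data: x2 depends on x1 only through a nonlinear (quadratic) link.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(5000)
x2 = x1 ** 2 + 0.1 * rng.standard_normal(5000)

# Identity transformations, which is what plain PCA effectively works with.
raw = np.column_stack([standardize(x1), standardize(x2)])

# A hand-picked nonlinear transformation of x1 (MCPCA would search for such functions).
transformed = np.column_stack([standardize(x1 ** 2), standardize(x2)])

raw_score = ky_fan_norm(np.cov(raw, rowvar=False), q=1)
transformed_score = ky_fan_norm(np.cov(transformed, rowvar=False), q=1)

print(f"Ky Fan 1-norm, identity transformations:  {raw_score:.3f}")
print(f"Ky Fan 1-norm, nonlinear transformation: {transformed_score:.3f}")
```

In this toy setting the linear correlation between `x1` and `x2` is close to zero, so the identity transformations yield a Ky Fan 1-norm near 1, while the quadratic transformation exposes the dependence and pushes the score toward 2, the maximum for two unit-variance variables.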


Similar resources

Study of Physical and Chemical Soil Properties Variations Using Principal Component Analysis Method in the Forest, North of Iran

The field study was conducted in one district of Educational-Experimental forest at Tehran University (Kheirood-Kenar forest) in the North of Iran. Eighty-five soil profiles were dug in the site of study and several chemical and physical soil properties were considered. These factors included: soil pH, soil texture, bulk density, organic carbon, total nitrogen, extractable phosphorus and depth ...


Internal Traits of Eggs and Their Relationship to Shank Feathering in Chicken Using Principal Component Analysis

Chicken eggs represent an important source of protein to the growing human population and also supply repositories of unique genes that could be used worldwide. The inheritance of shank feathering trait is dominant upon non-feathering shank trait in chicken which is based on two factors: pti-1L and pti-1B that are located on Chromosomes 13, 15, and 24. Using 185 fertile eggs collected from two ...


Discrimination of Golab apple storage time using acoustic impulse response and LDA and QDA discriminant analysis techniques

Firmness is one of the most important quality indicators for apple fruits, which is highly correlated with the storage time. The acoustic impulse response technique is one of the most commonly used nondestructive detection methods for evaluating apple firmness. This paper presents a non-destructive method for classification of Iranian apple (Malus domestica Borkh. cv. Golab) according...


Co-modulatory spectral changes in independent brain processes are correlated with task performance

This study investigates the independent modulators that mediate the power spectra of electrophysiological processes, measured by electroencephalogram (EEG), in a sustained-attention experiment. EEG and behavioral data were collected during 1-2 hour virtual-reality based driving experiments in which subjects were instructed to maintain their cruising position and compensate for randomly induced ...



Journal:
  • CoRR

Volume abs/1702.05471

Publication date 2017